In the theory video and associated pdf, it has been shown that the equation of a random walk is:
\[
x_t = \delta + x_{t-1} + w_t
\]
Where:
\(\delta \in \mathbb{R}\) is the constant delta added at each time step that results in an overall drift in the project.
\(x_{t-1}\) is the value of the process at the previous time step
\(w_t\) is a white noise process with 0 mean and constant variance \(\sigma_w^2\)
In the theory part of this session we have also proved by recursive substitution that this is equivalent to
\[
x_t = x_0 + \delta t + \sum_{j=1}^{t}w_t \underset{\substack{\uparrow \\ \text{set} \\ x_0 = 0}}{=} \delta t + \sum_{j=1}^{t}w_t
\] where in the last step we have set \(x_0\), the initial value of the random walk, to be 0. This has been done for simplicity and can always be attained by shifting the coordinate system. Thus, there is no loss of generality in this formulation.
Define simulating function
The function below simulates a random walk process. This function takes the following arguments:
n: number of timesteps to be simulated
delta: \(\delta \in \mathbb{R}\) added at each timestep that results in the overall drift in the random walk.
x_0 \in \mathbb{R}: initial value for the random walk process, in case we want it to start at a value different from 0.
noise_var: this is \(\sigma_w^2\), the variance of the white noise process governing the random walk.
# Define the functionsimulate_Rwalk <-function(n, delta, x_0, noise_var) {# Simulate white noise w_t =rnorm(n, sd=sqrt(noise_var))# Calculate the comulative effect of the# white noise process at each time step w_cumsum =cumsum(w_t)# Calculate cumulative drift at each time step drift = delta *1:n# Calculate x_t at each time step xt = x_0 + w_cumsum + drift# Create dataframe to return the results df <-data.frame(t =1:n,w_t = w_t,delta =rep(delta, n),drift = drift,w_cumsum = w_cumsum,x_t = xt )return(df)}
Single-run simulation: Random Walk (without drift)
Running the simulation
The code below runs a single simulation where we have set the parameters as indicated below
# Set seed for reproducibilityset.seed(160)# Simulate data using the functionn <-200# 200 timestepsdelta <-0# 0 drift (pure Random walk)x_0 <-0noise_var <-1df_rw <-simulate_Rwalk(n = n, delta = delta, x_0 = x_0, noise_var = noise_var)
To that code we now add the moving standard deviation applied using centered windows of 7 points.
Much like we could compute the average of the points in a window to obtain the moving average, we can compute the moving standard deviation.
The resulting moving standard deviation is local measure of the variability of the data locally.
We choose a window of 7 points as an example. If we think of this simulated data as closing prices of a stock market asset at the end of the day, this would correspond to the average of the prices in a one-week window (not necessarily a calendar week, because trading days do not match calendar days).
Now the code below depicts the generated random walk along with:
a blue dashed line indicating the overall drift. In this case we set \(\delta=0\), so there is no drift. As a result this line is horizontal
a red dashed line corresponding to the moving standard deviation. As you can see, it remains fairly constant. This indicates that, locally, the variance of this process is constant. In stock-market jargon these measures of local variance are sometimes referred to as volatility. There are much more ellaborate indicators of volatility. Here we have purposefully used this simple one.
# Plottingp <-# Specify global aestheticsggplot(df_rw, aes(x = t)) +# Plot x_t with line and pointsgeom_line(aes(y = x_t), color ="black") +geom_point(aes(y = x_t), color ="black") +# Plot drift (cumulative delta over time)geom_line(aes(y = x_0 + drift), linetype ="dashed", color ="blue") +# geom_line(aes(y = w_avg), color = "purple", linetype = "dashed") +# Plot rolling standard deviationgeom_line(aes(y = rolling_sd), color ="red", linetype ="dotted") +labs(title ="Simulated Random Walk with Drift and Rolling Std Dev", x ="Time", y ="Value")p
As we had shown in the theory, a random walk process is not stationary. We see that this random walk, even though it has no drift, it has a trend. The fact that this specific instance of a random walk has this trend is an example that shows that, in general, a random walk process is not stationary.
We had proved this in a stronger manner in the theory lesson by computing directly the autocovariance of the random walk process.
Even though the local standard deviation remains fairly constant, for the overall process the variance (the total range of values) grows with time, as we had shown in the theory, We proved that \(Var(x_t)=t\,\sigma_w^2\).
Remember one way in which we had proven it was through the following formula: \(x_t = \sum_{j=1}^{t}w_t\) (case without drift). This means that \(x_t\) is the sum of uncorrelated normal distributions, which is itself also normal. Its variance is \(t \, \sigma_w^2\) because the formula for the variance of the sum uncorrelated independent random variables indicates so (see theory part, where we applied this formula and also proved it).
Single-run simulation: Random Walk with drift
Running the simulation
The code below runs a single simulation where we have set the parameters as indicated below
# Set seed for reproducibilityset.seed(160)# Simulate data using the functionn <-200# 200 timestepsdelta <-0.35# 0 drift (pure Random walk)x_0 <-0noise_var <-1df_rw_drift <-simulate_Rwalk(n = n, delta = delta, x_0 = x_0, noise_var = noise_var)# Compute 7-point centered moving standard deviationdf_rw_drift$rolling_sd <- slider::slide_dbl(.x = df_rw_drift$x_t, .f = sd, .before =3, .after =3, .complete =TRUE)
Visualization
See the corresponding section for the Random Walk without drift
# Plottingp_drift <-# Specify global aestheticsggplot(df_rw_drift, aes(x = t)) +# Plot x_t with line and pointsgeom_line(aes(y = x_t), color ="black") +geom_point(aes(y = x_t), color ="black") +# Plot drift (cumulative delta over time)geom_line(aes(y = x_0 + drift), linetype ="dashed", color ="blue") +# geom_line(aes(y = w_avg), color = "purple", linetype = "dashed") +# Plot rolling standard deviationgeom_line(aes(y = rolling_sd), color ="red", linetype ="dotted") +labs(title ="Simulated Random Walk with Drift and Rolling Std Dev", x ="Time", y ="Value")p_drift
Since we have used an identical seed, the only difference between both processes is the presence of a constant drift, as the blue dashed line indicates. At each point in time, a constant \(\delta\) is added as indicated by the governing equation:
\[
x_t = \delta t + \sum_{j=1}^{t}w_t
\]
Multi-run simulation - Random walk (withot drift)
The code defines the function simulate_n_Rwalks. This function simply uses repeatedly simulate_Rwalk to produce a multi-run simulation of random walks. It returns a dataframe containing an identifier for each simulation and the results of each simulation.
simulate_n_Rwalks <-function(n_sims, n, delta, x_0, noise_var) { all_sims <-list() # list object to hold all dataframesfor (i in1:n_sims) { sim_data <-simulate_Rwalk(n, delta, x_0, noise_var) sim_data$sim_ID <- i all_sims[[i]] <- sim_data }# Combine all simulations into one data frame all_data <-bind_rows(all_sims) all_data <-select(all_data, sim_ID, everything())return(all_data)}
10 runs
Let us use the previous function to produce 10 runs.
This would be like getting 10 realizations of the time series, something we very rarely get in real-world applications unless we are conducting a statistical experiment.
# Set seed for reproducibilityset.seed(160)# Number of simulationsn_sims =10# Example for 10 simulationsn <-200delta <-0x_0_<-0noise_var <-1run_10_nodrift <-simulate_n_Rwalks(n_sims, n, delta, x_0, noise_var)# Exctract drift (same drift for every sim)drift <- run_10_nodrift %>%filter(sim_ID ==1) %>%pull(drift)
The graph below depits the result of each simulation. For clarity, each simulation has been assigned a specific color:
# Plottingp_10sims <-ggplot(run_10_nodrift, aes(x = t, y = x_t, group = sim_ID, color =as.factor(sim_ID))) +# Plot all simulationsgeom_line(aes(group = sim_ID), alpha =0.85) +# Plot drift (cumulative delta over time)geom_line(aes(y = x_0 + drift), linetype ="dashed", color ="blue") +# Adjust color scale to have distinguishable colorsscale_color_brewer(palette ="Paired") +theme(legend.title =element_blank(),legend.position ="none" ) +labs(title ="Random Walk with 0 drift - 10 simulations") # Add titlep_10sims
100 runs
We may grow the number of simulations we compute. Let us run 100 simulations and depict the results. Now all simulations will be depicted using the same color, since there are no paletters with 100 different clearly distinguishable colors and the point is simply to show to what extent the range of values change and if there are areas with higher concentration of values than others.
We can see that at the beginning of the process there is a higher density of interesecting curves, since all random walks start at \(x_0=0\). As t increases, the variance of the process grows linearly, as shown by the equation \(\sigma_{RWalk}^2 = t \, \sigma_w^2\)
# Set seed for reproducibilityset.seed(160)# Number of simulationsn_sims =100# Example for 10 simulationsn <-200delta <-0x_0_<-0noise_var <-1run_100_nodrift <-simulate_n_Rwalks(n_sims, n, delta, x_0, noise_var)# Exctract drift (same drift for every sim)drift <- run_100_nodrift %>%filter(sim_ID ==1) %>%pull(drift)
# Plottingp_100sims <-ggplot(run_100_nodrift, aes(x = t, y = x_t, group = sim_ID)) +# Plot all simulationsgeom_line(aes(group = sim_ID), alpha =0.1) +# Plot drift (cumulative delta over time)geom_line(aes(y = x_0 + drift), linetype ="dashed", color ="blue") +# Adjust color scale to have distinguishable colorsscale_color_brewer(palette ="Paired") +theme(legend.title =element_blank(),legend.position ="none" ) +labs(title ="Random Walk with 0 drift - 100 simulations") # Add titlep_100sims
10000 runs
To wrap-up, let us increase the number of simulations to 10000. This yields an ensemble of simulations sufficient that it assymptotically matches the values predicted in the theory part of the session:
# Set seed for reproducibilityset.seed(160)# Number of simulationsn_sims =10000# Example for 10 simulationsn <-200delta <-0x_0_<-0noise_var <-1run_10000_nodrift <-simulate_n_Rwalks(n_sims, n, delta, x_0, noise_var)# Exctract drift (same drift for every sim)drift <- run_10000_nodrift %>%filter(sim_ID ==1) %>%pull(drift)
# Plottingp_10000sims <-ggplot(run_10000_nodrift, aes(x = t, y = x_t, group = sim_ID)) +# Plot all simulationsgeom_line(aes(group = sim_ID), alpha =0.1) +# Plot drift (cumulative delta over time)geom_line(aes(y = x_0 + drift), linetype ="dashed", color ="blue") +theme(legend.title =element_blank(),legend.position ="none" ) +labs(title ="Random Walk with 0 drift - 10000 simulations") # Add titlep_10000sims
Ok, as expected the variance of the process increases over time. Let us check that, indeed, the random varable corresponding to each point in time is a normally distributed random variable. For that, let us produce cross sections of the data at the following points:
t = {25, 50, 75, 100, 125, 150, 175, 200}
The code below produces cross-sections of the simulation data at these time steps and stores each in a position of the list cross_data. It also uses the same for loop to add the points of each cross-section to the graph. * As you can see, each red point is the interesection of a simulated random walk with the lines \(t = constant\) chosen for the cross section.
# Points in time at which we want the cross-sectionst_cross <-seq(25, 200, by=25)cross_data <-list() # list object to hold all subset dataframesfor (i inseq_along(t_cross)) {# Subset data df_cross <- run_10000_nodrift %>%filter(t == t_cross[[i]]) cross_data[[i]] <- df_cross# Add cross section to the previous graph p_10000sims <- p_10000sims +geom_point(data = cross_data[[i]], aes(x = t, y = x_t), color ="red", alpha =0.25)}p_10000sims
To make the figure more clear, we are going to:
Do away with the random walk lines and retain only the data corresponding to each cross-section.
Compute the density curve corresponding to each cross-section and plot it on the graph.
Do not worry if you do not understand every detail of the code below, just focus on the graph.
scale_fac <-200p_sections_densities <-ggplot(run_10000_nodrift, aes(x = t, y = x_t, group = sim_ID))# Loop through each of the dataframes contained in the list "cross_data"for (i inseq_along(t_cross)) { df_cross <- cross_data[[i]] t_i <- t_cross[[i]] density_data <-density(df_cross$x_t)# Convert density data to a dataframe for plotting# The scaling factor can be adjusted for visual clarity df_density <-data.frame(x = density_data$x,y = density_data$y * scale_fac + t_i) p_sections_densities <- p_sections_densities +# Add cross sectiongeom_point(data = df_cross, aes(x = t, y = x_t), color ="red", alpha =0.25) +# Add density graphgeom_path(data = df_density, aes(x = y, y = x, group =1), color ="blue") +# Shade density curve# Use aes_string to avoid lazy evaluation issuesgeom_ribbon(data = df_density,aes_string(x =NULL, y ="x", xmin = t_i, xmax ="y", group ="1"),fill ="blue", alpha =0.3)}
Warning: `aes_string()` was deprecated in ggplot2 3.0.0.
ℹ Please use tidy evaluation idioms with `aes()`.
ℹ See also `vignette("ggplot2-in-packages")` for more information.
p_sections_densities
Conclusion
The cross-sections above result in normal density graphs. To inspect this accurately, we may compute the qq-plot. The perfect match should be sufficient to convince you that the distribution is normal. Let us check for t = 200 as an example:
df_cross_200 <- cross_data[[8]]# Adjusted QQ-plotqq_plot_adjusted <-ggplot(df_cross_200, aes(sample = x_t)) +geom_qq(alpha =0.5, size =0.5) +# Added transparency and reduced dot sizegeom_qq_line(color ="red") +# Made the theoretical line redlabs(title ="Q-Q plot of x_t values at t=200")qq_plot_adjusted
We may even compute the variance of the points at this cross-section. From the theory, we expect the variance at \(t=200\):
\[
\begin{equation*}
\begin{aligned}
&\sigma_{RW}^2 = t \, \sigma_w = t \,\cdot1 = t\\
&\sigma_{RW}^2|_{t=200} = 200
\end{aligned}
\end{equation*}
\] Now let us compute the variance of our sample:
var(df_cross_200$x_t)
[1] 200.8279
You can see that it matches the theory really well.
Multi-run simulation - Random walk (with drift)
Now let us do the same but we will add a drift term. We will go directly to the 10000 runs case.
10000 runs
Due to the similarity with the previous case, the code here is provided with no further comment.
Just one comment about the last graph. The variances are now still normally distributed, with the exact same variances as before, since we set the exact same seed and this guarantees repeatibility. What now changes is the mean of these distributions, which is given by the drift. As we proved in the theory, the expected value of a random walk process with drift and\(x_0=0\) is indeed \(\delta t\), due to the linearity of the expectation, the fact that \(\delta t\) is deterministic (that is, \(E[\delta t] = \delta t\)) and the fact that white noise has expectation equal to 0.
\[
E[x_t] = E[\delta t + \sum_{j=1}^{t}w_t] = \delta t
\]
# Set seed for reproducibilityset.seed(160)# Number of simulationsn_sims =10000# Example for 10 simulationsn <-200delta <-0.36# Add driftx_0_<-0noise_var <-1run_10000_drift <-simulate_n_Rwalks(n_sims, n, delta, x_0, noise_var)# Exctract drift (same drift for every sim)drift <- run_10000_drift %>%filter(sim_ID ==1) %>%pull(drift)
# Plottingp_10000sims <-ggplot(run_10000_drift, aes(x = t, y = x_t, group = sim_ID)) +# Plot all simulationsgeom_line(aes(group = sim_ID), alpha =0.1) +# Plot drift (cumulative delta over time)geom_line(aes(y = x_0 + drift), linetype ="dashed", color ="blue") +theme(legend.title =element_blank(),legend.position ="none" ) +labs(title ="Random Walk with 0 drift - 10000 simulations") # Add title
# Points in time at which we want the cross-sectionst_cross <-seq(25, 200, by=25)cross_data <-list() # list object to hold all subset dataframesfor (i inseq_along(t_cross)) {# Subset data df_cross <- run_10000_drift %>%filter(t == t_cross[[i]]) cross_data[[i]] <- df_cross# Add cross section to the previous graph p_10000sims <- p_10000sims +geom_point(data = cross_data[[i]], aes(x = t, y = x_t), color ="red", alpha =0.25)}p_10000sims
scale_fac <-200p_sections_densities <-ggplot(run_10000_drift, aes(x = t, y = x_t, group = sim_ID))# Loop through each of the dataframes contained in the list "cross_data"for (i inseq_along(t_cross)) { df_cross <- cross_data[[i]] t_i <- t_cross[[i]] density_data <-density(df_cross$x_t)# Convert density data to a dataframe for plotting# The scaling factor can be adjusted for visual clarity df_density <-data.frame(x = density_data$x,y = density_data$y * scale_fac + t_i) p_sections_densities <- p_sections_densities +# Add cross sectiongeom_point(data = df_cross, aes(x = t, y = x_t), color ="red", alpha =0.25) +# Add density graphgeom_path(data = df_density, aes(x = y, y = x, group =1), color ="blue") +# Shade density curve# Use aes_string to avoid lazy evaluation issuesgeom_ribbon(data = df_density,aes_string(x =NULL, y ="x", xmin = t_i, xmax ="y", group ="1"),fill ="blue", alpha =0.3)}p_sections_densities +# Plot drift (cumulative delta over time)geom_line(aes(y = x_0 + drift), linetype ="dashed", color ="blue")